perm filename HEURS[DIS,DBL]4 blob sn#221893 filedate 1976-06-29 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00021 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	.NSECP(Heuristic Rules)
C00006 00003	.SSEC(Syntax of the Heuristics)
C00009 00004	. SSSEC(Syntax of the Left-hand Side)
C00016 00005	. SSSEC(Syntax of the Right-hand Side)
C00019 00006	.SSEC(Heuristics Suggest New Tasks)
C00020 00007	. SSSEC(An Illustration: α"Fill in Generalizations of Equalityα")
C00029 00008	. SSSEC(The Ratings Game)
C00038 00009	.SSEC(Heuristics Create New Concepts)
C00040 00010	. SSSEC(An Illustration: Discovering Primes)
C00049 00011	. SSSEC(The Theory of Creating New Concepts)
C00055 00012	. SSSEC(Another Illustration: Squaring a number)
C00062 00013	.SSEC(Heuristics Fill in Entries for a Specific Facet)
C00066 00014	. SSSEC(An Illustration: α"Fill in Examples of Set-unionα")
C00074 00015	. SSSEC(Heuristics Propose New Conjectures)
C00079 00016	. SSSEC(An Illustration: α"All primes except 2 are oddα")
C00087 00017	. SSSEC(Another illustration: Discovering Unique Factorization)
C00097 00018	.SSEC(Gathering Relevant Heuristics)
C00099 00019	. SSSEC(Domain of Applicability)
C00105 00020	. SSSEC(Rippling)
C00111 00021	. SSSEC(Ordering the Relevant Heuristics)
C00118 ENDMK
C⊗;
.NSECP(Heuristic Rules)

Assume that somehow AM has selected a particular task from the agenda
-- say "⊗6Fill-in  Examples of Primes⊗*".  What precisely does AM do,
in order to execute the  task? How ⊗4are⊗* examples of primes  filled
in?

The answer can  be compactly stated as follows:  

.ONCE PREFACE 0 INDENT 4 SELECT 4

"AM selects relevant heuristics, and executes them."

This really just  splits our original question into two new ones: (i)
How are the relevant heuristics selected, and (ii) What  does it mean
for heuristics  to be executed (e.g., how  does executing a heuristic
rule help to fill in examples of primes?).

.ONCE TURN ON "{}"

These two topics (in reverse order) are the two major subjects of this
chapter. Although several examples of heuristics will be given, the
complete list is relegated to Appendix {[2] ALLCON}.$$
There they are condensed and phrased in English.
The reader
wishing to see examples of the heuristics as they actually were coded in
LISP should glance at Appendix {[2] CONS}. $

The first section explains what heuristic  rules look  like
(their "syntax", as it were). The next three sections 
illustrate how they  can be executed to achieve
their  desired  results (their  "semantics").  The final section
explains where they are stored and how they are accessed at the
appropriate times. 

.HEURS: SECNUM  ;
.SSEC(Syntax of the Heuristics)

Let's  start by  reviewing  what a  heuristic rule  looks  like.   In
general, a heuristic rule has the form

.B816

If ⊗4<situational fluent>⊗*

Then ⊗4<actions>⊗*

.E

As an  illustration, here is a heuristic  rule, relevant when checking examples
of anything:

.GROUP B816 ONCE INDENT 4 COMPACT

If the current task is to Check Examples of any concept X,

and (Forsome Y) Y is a generalization of X,

and Y has at least 10 examples,

and all examples of Y are also  examples of X,

.ONCE INDENT 4

Then print the following conjecture:  X is really no more specialized
than Y,

and add it to the Examples facet of the concept named "Conjectures",

and 
add the following task to the agenda:
"Check  examples of Y", for the  reason:
"Just as Y was no more general  than X, one-of Generalizations(Y) may
turn out to be no more general than Y", with a rating for that reason
computed as the average of: ||Examples(Generalizations(Y))||,
||Examples(Y)||, and Priority(Current task).

.E APART

As with production rules, 
and formal grammatical rules, each of AM's heuristic rules
has a left-hand-side and a right-hand-side.
On the left is a test to see whether the rule is applicable, and on the right
is a list of actions to take if the rule applies.
The left-hand-side will also be called the IF-part, the predicate, the
left side, or the situational fluent of the rule.
The right-hand-side will sometimes be referred to as the THEN-part, the
right side, or the actions part of the rule.

. SSSEC(Syntax of the Left-hand Side)

The situational fluent is  a LISP predicate, a function  which always
returns True or False (in LISP, it actually returns either the atom T
or the  atom NIL).   This  predicate may  investigate  facets of  any
concept (often merely to see whether  they are empty or not), use the
results of recent tests and behaviors (e.g., to see how much cpu time
AM spent trying to work on a certain task), etc.

The left side  is a conjunction  of the form  P1 ∧ P2  ∧...  All  the
conjuncts, except the very  first one, are arbitrary LISP predicates.
They are only constrained to obey two commandments:

.BN

λλ ⊗4Be quick!⊗* (return either True or False in under 0.1 cpu seconds)

λλ ⊗4Have no side effects!⊗*  (destroying or creating list structures
or Lisp functions, resetting variables)

.E

Here are some sample  conjuncts that might appear inside  a left-hand
side (but ⊗4not⊗* as the very first conjunct):

.BN PREFACE 1

⊗8# ⊗3  More than half of the current  task's time quantum is already
exhausted,...

⊗8# ⊗3 There are some known examples of Structures,...

⊗8# ⊗3  Some  generalization  of  the current  concept  (the  concept
mentioned  as  part  of  the  current task)  has  an  empty  Examples
facet,...

⊗8#  ⊗3 The space quantum  of the current task is  gone, but its time
allocation is less than 10% used up,...

⊗8# ⊗3 A task recently  selected had the form ⊗6"Restructure  facet F
of  concept  X"⊗*,  where F  is  any  facet,  and  X is  the  current
concept,...

⊗8# ⊗3 The user has used this system at least once before,...

⊗8# ⊗3 It's Tuesday,...

.E

The very first conjunct of each left-hand side is special. Its syntax
is highly constrained.   It specifies the domain of  applicability of
the rule, by  naming a particular facet of a particular concept to which
this rule is relevant.

AM uses this first conjunct as a fast "pre-precondition", so that the
only rules whose  left-hand sides get evaluated are  already known to
be  somewhat relevant  to the  task at  hand. In fact,  AM physically
attaches each rule  to the facet and  concept mentioned in its  first
conjunct.$$  Sometimes,  I  will  mention  where a  certain  rule  is
attached; in that  case, I  can omit  explicit mention  of the  first
conjunct.  Conversely, if I include that conjunct, I needn't  tell you
where the rule is  stored. $ This will be discussed in more detail in
the final  section of  this chapter,  "Finding relevant  heuristics".
This first  conjunct will always be  written out as follows,  in this
document (where A, F, and C are specified explicitly):

.ONCE PREFACE 1 INDENT 4,4,4 SELECT 3

The  current task (the one  just selected from the  agenda) is of the
form ⊗6"Do action ↓_A_↓ to the ↓_F_↓ facet of concept ↓_C_↓"⊗*

This can be viewed as the "syntax" of the very first conjunct on each
rule's left-hand  side.  Here  are two typical  examples of allowable
first conjuncts:

.BN PREFACE 1

⊗8# ⊗3 The current task (the one last selected from the agenda) is of
the form ⊗6"Check the Domain/range facet of  concept X"⊗3, where X is
any operation,...

⊗8# ⊗3  The current task is of the  form ⊗6"Fillin the examples facet
of the Primes concept"⊗3,...

.E

These are the only guidelines which the left-hand side of a heuristic
rule  must   satisfy.  Any  LISP  predicate   which  satisfies  these
constraints  is a  syntactically  valid left-hand  side for  a heuristic
rule.
It turned out later that this excessive freedom made it difficult for
AM to inspect  and analyze and synthesize its  own heuristics; such a
need was not foreseen at the time AM was designed.

Because of this  freedom, there  is not much  more to  say about  the
left-hand sides of rules. As the reader  encounters heuristics in the
next  few sections,  he should  notice  the (unfortunate)  variety of
conjuncts which may occur as part of their predicates (left sides).

. SSSEC(Syntax of the Right-hand Side)

"Running" the left-hand-side means evaluating the series of conjoined
little  predicates there, to see  if they all return  True. If so, we
say that the  rule "triggers". In that  case, the right-hand-side  is
"run",  which means  executing all  the actions  specified there.   A
single  heuristic rule  may  have a  list of  several actions  as its
right-hand-side.  The actions are executed in order, and  we then say
the rule has finished running.

Only  the right-hand-side of  a heuristic  rule is permitted  to have
side effects.  The right  side of a rule is  a series of little  LISP
functions, each of which is called an ⊗4action⊗*.

Semantically,  each   action  performs   some  processing  which   is
appropriate  in some  way to  the  kinds of  situations in  which the
left-hand-side would have triggered.  The final value that the action
function returns is irrelevant.

Syntactically, there  is only one  constraint which each  function or
"action"  must  satisfy:  Each  action has  one  of  the  following 3
side-effects, and no other side-effects:

.BN

λλ It suggests a new task for the agenda.

λλ It causes a new concept to be created.

λλ It adds (or  deletes) a certain entry  to a particular facet  of a
particular concept.

.E

To repeat: the right side  of a rule contains a list of actions, each
of which is one of  the above three types.  A single rule might  thus
result in the creation of several new  concepts, the addition of many
new tasks  to the agenda, and  the filling in of  some facets of some
already-existing concepts.

These three kinds of actions  will now be discussed in the  following
three sections.

.SSEC(Heuristics Suggest New Tasks)

This section discusses the "proposing a new task" kind of action.

Here is the basic idea in a nutshell: The left-hand-side of a rule triggers.
Scattered  among  the "things  to do"  in its right-hand-side are 
some suggestions  for future  tasks. These
new tasks are  then simply  added to  the agenda  list.

.XGENLINES←XGENLINES-2;

.IF LINES≤10 THEN START NEXT PAGE END

. SSSEC(An Illustration: α"Fill in Generalizations of Equalityα")

If a new task is  suggested by a heuristic rule, then  that rule must
specify how to assemble  the new task, how to get reasons for it, and
how to  evaluate  those reasons.   For  example,  here is  a  typical
heuristic rule which proposes a new task to add to the agenda. It says
to generalize a predicate if it is very rarely$$
The most  suspicious part of the situational fluent (the IF-part) is the 
number ".05". Where
did it come from? Hint: if all humans had nine fingers, this would probably be
".0555..". Seriously, one can change this value (to .01 or to .25)
with virtually no change in AM's behavior. This is the conclusion of
experiment 4 (see Section {[2]EXPT}.{[2]EXPTSSEC}.4). Such empirical
justification is one
important reason for actually writing and running large programs like AM. 
$  satisfied:

.XGENLINES←XGENLINES-2;

.B816

.ONCE INDENT 4

If the current task was (Fill-in examples of X),

and X is a predicate,

and more than 100 items are known in the domain of X,

and at least 10 cpu seconds were spent trying to randomly instantiate
X,

and the ratio of successes/failures is both >0 and less than .05

.ONCE INDENT 4

Then  add the following task to  the agenda: (Fill-in generalizations
of X), for the following reason:

"X is rarely satisfied; a slightly less restrictive  concept might be
more interesting".

This  reason's  rating  is  computed  as  three times  the  ratio  of
nonexamples/examples found.

.E

Even this  is one  full step  above the  actual LISP  implementation,
where ⊗6"X is a predicate"⊗* would be coded as ⊗6"(MEMBER X (EXAMPLES
PREDICATE))"⊗*.  The function  ⊗6EXAMPLES(X)⊗* rummages  about looking
for already-existing  examples of  X.
Also, the LISP code contains information for normalizing all the numbers
produced, so that they will lie in the range 0-1000.

Let's examine an instance of where this rule was used. At some point,
AM chose the task ⊗6"Fillin examples of List-equality"⊗*.  One of the
ways it filled in examples of this  predicate was to run it on  pairs
of randomly-chosen lists, and observe whether the  result was True or
False$$ The True ones became examples of List-equality, and the pairs
of  lists  which  didn't  satisfy  this  predicate  became  known  as
non-examples (failures,  foibles,...).   A heuristic similar  to this
"random  instantiation" one is illustrated  in 
Section {SECNUM}.{[2]RANDISSEC}, on page {[3]RANDI}.
$.  Say that 244 random pairs of lists were tried, and only twice was
this predicate satisfied.  Sometime later, the IF part
of the above heuristic  is examined.  All the  conditions are  met, so  it
"triggers". 
For example, the "ratio of successes to failures" is just 2/242, which is
clearly greater than zero and less than 0.05.
So the  right-hand-side (THEN-part) of the above rule is  executed. 
The right-hand side initiates only one action:
the task
"⊗6Fillin generalizations of List-equality⊗*" is added to the agenda,
tagged with the reason "List-equality is rarely satisfied; a slightly
less restrictive concept might be  more interesting", and that reason
is assigned a numeric rating of 3x(242/2) = 363.

Notice  that the heuristic  rule above supplied a  little function to
compute  the  value  of  the  reason.
That formula was:
"three  times   the  ratio  of
examples/nonexamples found".$$
In actuality, this would be checked to ensure  that the
result lies between 0 and 1000. $
Functions  of this type, to compute the
rating  for  a   reason,  satisfy  the   same  constraints  as   the
left-hand-side did: the function  must be very fast and  it must have
no side effects. The "intelligence" that AM exhibits in selecting which
task to work on, ultimately depends on the accuracy of these local rule
evaluation formulae. Each one is so specialized that it is "easy" for it to
give a valid result; the range of situations it must judge is quite narrow.
Note that these little formulae were hand-written, individually, by the
author. AM wasn't able to create new little reason-rating formulae.

The  reason-rating function  is evaluated  at the  moment the  job is
suggested,  and only  the  numeric  result  is  remembered,  not  the
original function.  In other words, we  tack on a list of reasons and
associated  numbers,  for  each  job on  the  agenda.  The agenda ⊗4doesn't⊗*
maintain copies of the reason-rating functions which gave those  numbers. 
This simplification is used merely to save the system some space and time.

Let's turn now from the reason-rating formulae to the reasons themselves.
Each reason supporting  a newly-suggested
job is simply an English sentence (an  opaque string, a token). 
AM cannot do much intelligent processing on these reasons. 
AM is  not
allowed to inspect parts of it, parse it, transform it, etc. The most
AM  can do  is compare two  such tokens  for equality.  Of course, it
would not  be too  hard to  extend this  capability to  permit AM  to
syntactically analyze such strings, or to trivially compute some sort
of "difference" between two given reasons.
Each reason is assumed to have some semantic impact on the user, and is
kept around for that purpose.

Each
reason will have a numeric rating 
(a number between 0 and 1000)
assigned to it locally, by the heuristic rule which proposed the task
for that reason.
One global formula will then combine
all the reasons' ratings into one single priority value for the task.

. SSSEC(The Ratings Game)

In general, a  task on the agenda  list will have several  reasons in
support  of it.   Each  reason consists  of an  English phrase  and a
numeric rating.  How can a task have more than one reason?  There are
two  contributing  factors: (i)  A  single  heuristic rule  can  have
several reasons in support of a job it suggests, and (ii) When a rule
suggests a "new" task, that  very same task may already exist  on the
agenda, with quite  distinct reasons tacked on there.   In that case,
the new reason(s) are added to the already-known ones.

One global formula  looks at  all the  ratings for  the reasons,  and
combines them into a  single priority value for the task  as a whole.
Below is that formula, although I hesitate to even present it:

.FORMULASSEC: SSECNUM

.FORMULA:

.BEGIN NOFILL INDENT 4 SELECT 6 TURN ON "↑↓"

Worth(J) = ||SQRT(SUM R↓i↑2)|| ⊗7x⊗* [ .2⊗7x⊗*Worth(A) + .3⊗7x⊗*Worth(F) + .5⊗7x⊗*Worth(C)]

Where J = job to be judged = (Act A, Facet F, Concept C)
     and {R↓i} are the ratings of the reasons supporting J.

.END

For example, consider  the job J =  (Check examples of Primes).   The
act A would be  "Check", which has a numeric worth of 100.  The facet
F would be "Examples", which has a numeric worth of 700.  The concept
C would  be "Primes", which  at the moment  might have Worth  of 800.
Say  there were four reasons,  having values 200,  300, 200, and 500.
The double lines  "||...||" indicate normalization, which  means that
the final  value of the  square-root must be  between 0 and  1, which
really just  means dividing  the result  of the  Square-root by  1000
always.

In this case, we first compute Sqrt(200↑2 + 300↑2  + 200↑2 + 500↑2) =
Sqrt(420,000), which is about 648.  After normalization, this becomes
0.648.  The expression in square brackets in the formula$$  Namely, [
0.2xWorth(A) + 0.3xWorth(F) + 0.5xWorth(C) ].  $ is actually computed
as  the  dot-product of  two  vectors$$ Namely,  <Worth(A), Worth(F),
Worth(C)> and < .2,  .3, .5 >. The  dot-product of <a1 a2  a3...> and
<b1 b2 b3...> is defined as (a1  x b1) + (a2 x b2) + (a3 x b3) +... $;
in this case it is the dot-product  of (100 700 800) and (.2 .3  .5),
which yields  630. This is  multiplied by the  normalized Square-root
value, 0.648, and we end up with a final priority rating of 408.

The four  reasons  each have  a fairly  low priority,  and the  total
priority of  the task is therefore not  great. It is, however, higher
than any single reason.  This is because there are  many ⊗4distinct⊗*
reasons supporting  it.   The global  formula uniting  these reasons'
values does  not simply take the largest of them (ignoring the rest),
nor does it simply add them up.

The above formula was intended originally as a first  pass, an ⊗4ad hoc⊗*
guess, which I expected I'd have to modify later. Since it has worked
successfully, I have not messed with it.  
There is no reason behind it, no justification for taking dot-products of
vectors, etc.
I concluded, and recent experiments tend to confirm, that the
particular form  of  the formula  is unimportant;  only some  general
characteristics need be present:

.BN

λλ  The task's  value is at  least as  big as  the value of  the best
reason supporting it.

λλ The more distinct reasons that exist, the higher the value.

λλ The  value is  greatly increased  if a new, good, hitherto unknown
reason is perceived.

λλ The value isn't increased if the same reason is present  more than
once. (Like humans, AM would is fooled if
the same reason reappears in disguised form).

.E

.ONCE TURN ON "{}"

I believe that all of these criteria are absolutely essential to good
behavior  of  the system.    Experiments 1,  3,  and 5  bear  on this
question (See Section {[2]RESULT}.{[2]EXPTSSEC}, page {[3]EXPTPAGE}).
These four points are all  intuitively obvious$$ If the reader balks,
he should examine Polya's ⊗4Patterns of Plausible Inference⊗* over and over,
until he
agrees they are intuitively obvious. $.  Note that  the messy formula
given  on   the  last  page  does  incorporate   all  four  of  these
constraints.

.GROUP

Now that we've seen how to compute this priority value for any  given
task, let's not forget what it's used for. The overall rating has two
functions:

.BN

(i) The tasks on the agenda list are ordered by their ratings, and AM
always chooses the top task. Thus this rating determines which task to
execute next. This  is not an ironclad policy: In  reality, AM prints
out  the top few tasks,  and the user has  the option of interrupting
and directing AM to work  on one of those other tasks instead  of the
very top one.

(ii) Once  a task is chosen,  its overall rating  determines how much
time and space AM will expend on it before quitting and moving on  to
a new  task.   The precise  formulae are  unimportant.  Roughly,  the
0-1000 rating is  divided by ten to determine how much time to allow,
in cpu seconds. The  rating is divided by  two to determine how  much
space to allow, in list cells.

.E APART

.SSEC(Heuristics Create New Concepts)

.COMMENT Check this next label -- it might be spurious;

.CREATECON: PAGE;

Recall that a heuristic  rule's actions are of three types:

.BN

λλ Suggest new tasks and add them to the agenda.

λλ Create a new concept.

λλ Fill in some entries for a facet of a concept.

.E

This subsection discusses the second activity.

Here
is the basic idea in a nutshell:
Scattered among the "things to do" in the right-hand-side of a rule
are some requests to create specific new concepts. For each such request,
the heuristic rule
must specify how to construct it. At least, the rule must
specify  ways  of assembling  enough  facets  of the  new  concept to
disambiguate it  from all  the other known  concepts. Typically,  the
rule will explain how to  fill in the Definition of -- or an Algorithm
for -- the new concept.  After executing these instructions, the new
concept will "exist", and a few of its facets will be filled in, and
a few new jobs will probaby exist on the agenda, indicating that AM might want
to fill in certain other facets of this new concept in the near future.

. SSSEC(An Illustration: Discovering Primes)

.COMMENT WARNING: Centering going on -- Don't JUSTIFY!!!;

Here is a heuristic rule  that
results in a new concept being created:

.B816

.ONCE INDENT 4

If the current task was (Fill-in examples of F),

and F is an operation from domain space A into range space B,

and more than 100 items are known examples of A (in the domain of F),

and more than 10 range items (in B) were found by applying F to these
domain items,

and at least  1 of these range items is a distinguished member (esp: an
extremum) of B,

.CPPAGE: PAGE;

.ONCE INDENT 4

Then (for each such distinguished member "b"εB) create the following new concept:

.E


.WBOX(16,8)
MBOX Name: F-Inverse-of-b $
MBOX Definition: λ (x) ( F(x) is b ) $
MBOX Generalization: A $
MBOX Worth: Average(Worth(A), Worth(F), Worth(B), Worth(b), ||Examples(B)||) $
MBOX Interest: Any conjecture involving both this concept and either F or Inverse(F) $
.EBOX

.B816

.ONCE INDENT 4

In case  the user  asks, the  reason for  doing this is:  
"Worthwhile
investigating those A's  which have an unusual F-value, namely, those
whose F-value is b"

.ONCE INDENT 4

The total  amount of time  to spend  right now  on all  of these  new
concepts is computed as: 

Half the  remaining cpu time in  the current
task's time quantum.

.E 

.SKIP 3

Although some examples of ⊗6F-inverse-of-b⊗* might be easily obtained
(or already known) at the moment of its creation, the above rule doesn't specifically
tell AM how to fill in that facet. The very last line of the heuristic indicates
that a few cpu seconds may be spent on just this sort of activity: filling in facets
of the new concept which, though not explicitly mentioned in the rule, are
easy to fill in now. Any facet X which didn't get filled in "right now" will
probably cause a new task to be added to the agenda, of the form: "⊗6Fillin
facet ↓_X_↓ of concept ↓_F-inverse-of-b_↓⊗*".
Eventually, AM would choose that task, and spend a large quantum of time working
on that single facet.

.ONCE TURN ON "{}"

Heuristics for the new concept are quite hard to fill in. This was one of AM's
most serious limitations, in fact (see Chapter {[2]EVALU}).
Above, we see a trivial kind of "heuristic schema" or template, which gets
instantiated to provide one new, specialized heuristic about the new concept.
That new heuristic tells how to judge the interestingness of any conjecture
which crops up involving this new concept. Whenever such conjectures get proposed,
they are evaluated by calling on just such heuristics.

Now let's look at an instance of when this heuristic was used. At one
point,  AM   was  working   on   the  task   "⊗6Fill-in  examples   of
Divisors-of⊗*".

This  heuristic's  IF-part was  triggered because: Divisors-of is  an
operation (from Numbers to Sets  of  numbers), and far more than 100  different
numbers  are  known, and  more than 10  different  sets  of  factors  were  found
altogether,  and some  of them  were  distinguished by  being extreme
kinds of sets: empty-sets, singletons, doubletons  and  tripletons.   

After its
left side triggered, the right side of the heuristic rule was executed.
Namely, four new concepts were created  immediately. Here is one of
them:

.WBOX(12,10)
MBOX Name: Divisors-of-Inverse-of-Doubleton $
MBOX Definition: λ (x) ( Divisors-of(x) is a Doubleton ) $
MBOX Generalization: Numbers $
MBOX Worth: 100 $
MBOX Interest: Any conjecture involving both this concept and either Divisors-of or Times $
.EBOX

This is a  concept representing a  certain class of numbers,  in fact
the  numbers  we  call ⊗4primes⊗*.  The  heuristic  resets a  certain
variable, so that in  case the user  interrupts and asks ⊗4Why?⊗*, AM informs him:

.BEGIN CENTER

"This concept was created because it's worthwhile investigating those numbers which 
have an extreme divisors-of value; in this case, numbers which have only two divisors". 

.E

AM was willing to spend half
the  remaining  quantum of  time  allotted  to  ⊗6"Fillin examples  of
Divisors-of"⊗* on these four new  concepts$$ Some senseless details: One-eighth
of the remaining time is spent
on each of
these 4 concepts: Numbers-with-0-divisors, Numbers-with-α1-divisor,
Numbers-with-2-divisors, Numbers-with-3-divisors.
The
original time/space limits were  in reality about 25 cpu  seconds and
800  list cells, and  at the moment  this heuristic  was called, only
about 10 seconds and 600  cells remained, so e.g. the concept  Primes
was  allotted only  1.2 cpu  seconds  and 80  cells to  "get off  the
ground". This was no problem, as it used far less than that. $.

The heuristic rule is  applicable to any operation, not  just numeric
ones.     For   example,  when   AM  was   filling  in   examples  of
Set-Intersection, it was noticed that some pairs of sets  were mapped
into the  extreme kind of set  Empty-set. The above rule  then had AM
define  the concept of  ⊗4Disjointness⊗*: pairs of  sets having empty
intersection.

. SSSEC(The Theory of Creating New Concepts)

.BEGIN NOFILL END COMMENT this is to cause a break (fix pub bug); 

All
the heuristic  rule must do is  to fill in enough  facets so that the
new concept  is disambiguated  from all  the others,  so  that it  is
"defined" clearly.   Should AM  pause and fill  in lots of  facets at
that  time?  After all, several pieces  of information are trivial to
obtain at this  moment, but may be  hard to reconstruct later  (e.g.,
the  reason why  C  was created).    On the  other  hand, filling  in
anything without a good  reason is a  bad idea (it  uses up time  and
space,  and  it won't  dazzle  the  user  as a  brilliant  choice  of
activity).


So the universal motto of AM is to fill in facets of a new concept if
-- and only  if -- that  filling-in activity will  be much easier  at
that moment than later on.

In almost  all cases, the following  facets$$ The reader may  wish to
review     Section    1.?,    or     glance    ahead    to    Section
{[2]KNOWL}.{[2]FACETSSEC}, page {[3]FACETPAGE} to note the full range
of facets that any concept may possess: what their names are, and the
kind of  information that  is stored  in each.  $ will  be  specified
explicitly in the heuristic  rule, and thus will get  filled in right
away:  Definitions, Algorithms,  Domain/range, Worth,  plus a  tie to
some related concept (e.g., if the new concept is a generalization of
Equality,  then   we  can   trivially  fill  in   an  entry   on  its
Specializations facet: "Equality".)

On  the other hand, the  following facets will  ⊗4not⊗* be trivial to
fill in: Conjectures, Examples, Generalizations, Specializations, and
Interestingness.   For example, filling in  the Specializations facet
of a new concept may involve creating some new concepts; finding some
entries  for its  Conjectures  facet  may  involve a  great  deal  of
experimenting; finding  some Examples of it  may involve twisting its
definition around.  None of these is easier to do at time of creation
than any other time, so it's  deferred until some reason for doing it
exists.

For each such "time-consuming" facet F, of the new concept C, one new
task gets added  to the agenda,  of the form  ⊗6"Fill in entries  for
facet F of concept C"⊗*, with reasons of the form "Because C was just
created," and also "No  entries exist so  far on C.F"$$
C.F is an abbreviation for facet F of concept C
$. Most of  the
tasks generated this way will have low priority ratings, and may stay
near the bottom  of the agenda until/unless they are re-suggested for
a new reason.

Using the Primes example, from the last subsection, we see that a new
task like "⊗6Fillin specializations of Primes⊗*" was suggested with a
low rating,  and "⊗6Fillin examples of Primes⊗*" was suggested with a
mediocre$$ Not as low a rating as the task just mentioned. Why?  Each
possible facet  has a worth rating  which is fixed once  and for all.
As an illustration, we mention that the facet Examples is rated  much
higher than  Specializations.   Why  is this?    Because looking  for
examples of a  concept is often a good expenditure of time, producing
the raw  data on which  empirical induction  thrives.   On the  other
hand, each  specialization of  the new  concept C  would itself  be a
brand new concept. So filling in entries for the Specializations facet
would be a very explosive  process.  $ rating.  The ratings  of these
tasks increase later on, when they're re-proposed for new reasons.

. SSSEC(Another Illustration: Squaring a number)

Let's take  another  simple (but  not atypical)  illustration of how
new concepts get created.
(The reader may skip this subsection; it contains more details about
how AM actually sets up new concepts.)

Assume  that  AM  has  recently  discovered the  concept  of
multiplication, which it calls "TIMES," and AM decides that it  is very
interesting. A heuristic  rule exists which says:$$  By glancing back
at the  Primes example, two subsections ago, page {[3] CPPAGE}, you can imagine
what this rule actually looked like. There is nothing to be gained by
stretching  it out in  all its  glory, hence  I've taken  the liberty
condensing it, inserting pronouns, etc. $

.B816

If a  newly-interesting  operation F(x,y)  takes  a pair  of  N's  as
arguments,

Then create  a new  concept, a specialization of F, 
called  F-Itself, taking  just one N  as
argument, defined as F(x,x), with initial worth Worth(F).

.E

In the  case of F = TIMES,  we see that F takes  a pair of numbers as
its arguments,  so the  heuristic rule  would have  AM  create a  new
concept  called   TIMES-Itself,  defined  as   TIMES-Itself(x)  ⊗6≡⊗*
TIMES(x,x).   That is, create the new  concept which is the operation
of ⊗4squaring a number.⊗*

What would  AM do in  this situation?   The  global list of  concepts
would  be enlarged to  include the  new atom "TIMES-Itself",  and the
facets of  this  new  concept would  begin  to  be filled  in.    The
following facets would get filled in almost instantly:

.WBOX(19,20)
MBOX $
MBOX NAME: TIMES-Itself $
MBOX $
MBOX DEFINITIONS: $
MBOX           	ORIGIN: λ(x,y) [TIMES.DEFN(x,x,y)] $
MBOX $
MBOX ALGORITHMS: λ(x) [TIMES.ALG(x,x)] $
MBOX $
MBOX DOMAIN/RANGE: Number α→ Number $
MBOX $
MBOX GENERALIZATIONS: TIMES $
MBOX $
MBOX WORTH: 600 $
MBOX $
.EBOX

The  name,   definition,   domain/range, generalizations, and worth  are   specified
explicitly by the heuristic rule.

The  lambda  expression  stored  under  the  definition facet  is  an
executable LISP predicate, which accepts two arguments and then tests
them to  see whether the second  one is equal to  TIMES-Itself of the
first argument.   It performs this test by calling upon the predicate
stored under the  definition facet of the  TIMES concept.  
Thus TIMES-Itself.Defn(4,16) will call on TIMES.Defn(4,4,16), and return
whatever value ⊗4that⊗* predicate returns (in this case, it returns True,
since 4x4 does equal 16).

A  trivial
transformation  of  this definition provides  an  algorithm  for computing  this
operation. The algorithm says to call on the Algorithms facet of the concept
TIMES. Thus TIMES-Itself.Alg(4) is computed by 
calling on TIMES.Alg(4,4) and returning ⊗4that⊗* value (namely, 16).

The worth of TIMES was 600 at  the moment TIMES-Itself was created, 
and this becomes the  worth
of TIMES-Itself.

TIMES-Itself  is by  definition  a specialization  of  TIMES, so  the
SPECIALIZATIONS  facet of TIMES is  incremented to point  to this new
concept.  Likewise, the  GENERALIZATIONS facet of TIMES-Itself  points
to TIMES.

Note how easy it  was to fill in these facets  now, but how difficult
it might be later on, "out of context". By way of contrast,
the task of, e.g., filling in
⊗4Specializations⊗* of TIMES-Itself will  be no harder later  on than
it is  right now,  so we may  as well defer  it until there's  a good
reason for it. This task will probably be added to the agenda with so
low a priority that AM  will never get around to it,  unless some new
reasons for it emerge.

The   task  ⊗6"Fill-in   examples  of  TIMES-Itself"⊗*   is  probably
worthwhile doing soon, but  again it won't be any  harder to do at  a
later time than  it is right now.   So it is not done  at the moment;
rather, it is added to the agenda (with a fairly high priority).

Incidentally, the  reader may be interested to know that the next few
tasks AM selected  (in reality)  were to create  the inverse of  this
operation (i.e., integer square-root),  and then to create a new kind
of number, the ones which can be produced by squaring (i.e.,  perfect
squares).   Perfect squares were  deemed worth having  around because
Integer-square-root is ⊗4defined⊗* precisely on that set of integers.

.SSEC(Heuristics Fill in Entries for a Specific Facet)

.HFILSSEC: SSECNUM;

The last two subsections  dealt with how a heuristic rule  is able to
propose  new  tasks  and  create  new  concepts.  This  section  will
illustrate how a rule can find some entries for 
a given facet of a specific concept.

Typically,  the facet/concept involved will be the one mentioned in the
current task which was chosen from the agenda.
If the task "⊗6Fillin Examples of Set-union⊗*" were plucked from the agenda,
then the "relevant" heuristics would be those useful for filling in
entries for the Examples facet of the Set-union concept.

.ONCE TURN ON "{}"

There is an important class of exceptions to this, however: conjectures.
Some rules will specify plausible relationships to look for; if found, they
constitute a new conjecture. For example, the reader will see in Section
{[2] UFTSEC}.{[2]UFTSSEC}.{[2]UFTSSSEC}, on page {[3]UFTPAGE},
that the unique factorization theorem is proposed merely as an
observation of the form "Concept C1 is not only a relation, it is a function".
In fact, this whole conjecture is recorded by merely adding C1 as one new
entry on the Examples facet of the "Function" concept.

The reader may be surprised to learn that the only kind of conjecture AM can
make is of that form (add a new entry to some facet of some concept)$$
That's why "conjecturing" is classified under the "add-an-entry" type of 
heuristic rule action. $.
Apparently, this is sufficient to plausibly notice and state most interesting
conjectures. Good definitions make the statements of theorems short and simple.$$
Exercise for the doubting reader: State the unique factorization theorem in
purely set-theoretic terms. Seriously, one important way that definitions are
↓_invented_↓ is to see what bulky construct in a theorem can be collapsed into a
single term. Typically one hopes that the term will be used elsewhere, of course. $

We'll take these two kinds of "filling in entries" one at a time: first the
standard "find an entry for the facet of the concept mentioned in the current
task", followed by the interesting but rarer activity of "looking for a conjecture".

. SSSEC(An Illustration: α"Fill in Examples of Set-unionα")

Recall that a  task is typically  of the form  ⊗6"Fill in facet  F of
concept  C"⊗*.    How can executing relevant heuristic rules satisfy
such a task?
This  subsection illustrates  how a  heuristic rule
might be executed  to find some entries  for the facet designated  by
the current task.

A typical heuristic, attached to the concept Activity, says:

.GROUP B816

If the current task is to fill in examples of the activity$$ Recall
that  "Activity" is a general concept which includes operations, predicates,
relations, functions, etc. $
F,

One  way to get them is  to run F on  randomly chosen examples of the
domain of F.

.E

.RANDI: PAGE;

.RANDISSEC: SSECNUM;

Of course, in the LISP implementation, this  situation-action rule is
not coded quite so neatly.  It would be more faithfully translated as
follows:

.B816

.ONCE INDENT 4

If CURRENT-TASK matches (FILLIN EXAMPLES F←anything)),

and F isa Activity,

.ONCE INDENT 4

Then carry out the following procedure:

1. Find the domain of F, and call it D;

2.  Find examples of D, and call them E;

3.  Find an algorithm to compute F, and call it A;

4.  Repeatedly:

.INDENT 16,16,0

4a. Choose any member of E, and call it E1.

4b. Run A on E1, and call the result X.

4c. Check whether <E1,X> satisfies the definition of F.

4d. If so, then add <E1 → X> to the Examples facet of F.

4e. If not, then add <E1 → X> to the Non-examples facet of F.

.E APART

Let's take a particular instance where this rule would be useful. Say
the  current  task  is  ⊗6"Fillin  examples  of  Set-union"⊗*.    The
left-hand-side of the  rule is satisfied,  so the  right-hand-side is
run.

Step (1) says to locate  the domain of Set-union. The facet  labelled
Domain/Range, on the Set-union concept, contains the entry (SET SET →
SET), which indicates that the domain is a pair of sets.
That is, Set-union is an operation which accepts (as its arguments) 
a pair of sets, and returns
(as its value) some new set.

Since the domain elements are sets,
step  (2)  says  to  locate  examples  of  sets.  The facet  labelled
Examples, on the Sets concept, points to a list of about 30 different
sets.  This includes {Z}, {A,B,C,D,E}, {}, {A,{{B}}},...

Step  (3) involves  nothing more  than accessing some randomly-chosen
entry on  the Algorithms  facet of  Set-union. One  such entry  is  a
recursive LISP function of two arguments, which  halts when the first
argument is  the empty set, and otherwise it  pulls an element out of
that set  and SET-INSERT's  it  into the  second argument,  and  then
recurses on the  new values of the two sets.   For convenience, we'll
refer to this algorithm as UNION.

We then enter the loop of Step (4).  Step (4a) has us choose one pair
of our examples of sets, say the first two {Z} and {A,B,C,D,E}.  Step
(4b) has us run UNION on these two sets. The result is {A,B,C,D,E,Z}.
Step (4c)  has  us  grab  an entry  from  the  Definitions  facet  of
Set-union, and run it. A typical definition is this formal one:

.BEGIN NOFILL SELECT 6 INDENT 3,3,3

(λ (S1 S2 S3) 
	(AND
 		(For all x in S1, x is in S3)
		(For all x in S2, x is in S3)
		(For all x in S3, x is in S1 or x is in S2)
	)
)  )


.E

It   is  run   on  the   three   arguments  S1={Z},   S2={A,B,C,D,E},
S3={A,B,C,D,E,Z}.  Since it returns "True"$$ Actually, it returns the
value of the final "For all...", which we may assume is "True". $, we
proceed  to  Step   (4d).    The  construct   <{Z},  {A,B,C,D,E}  →
{A,B,C,D,E,Z}> is added to the Examples facet of Set-union.

At this stage, control returns to the beginning of the Step (4) loop.
A new pair of sets is chosen, and so on.

But when would this loop  stop? Recall that each task has a time and a space
allotment (base  on its priority value). 
If there are many different rules who claim to be relevant to the current task,
then each one is allocated a small fraction of those time/space quanta.
When either of these resources is exhausted,
AM would break away  at a "clean" point (just
after finishing a cycle of the Step (4) loop) and would move on to
a new heuristic rule for filling in examples of Set-union.

This concludes the demonstration  that a heuristic rule really can be
executed to produce the  kinds of entities  requested by the  current
task.     In  this  sense,   the  rules  behave   like  goal-directed
(partially data-directed) function calls.

. SSSEC(Heuristics Propose New Conjectures)

We saw in the sample excerpt (Chapter 2) that AM occasionally notices
some  unexpected  relationship,  and  formulates  it  into a  precise
conjecture.  Below is an example of  how this is done.  As you  might
guess from the placement of this subsection,$$
or recall from the opening remarks of Section {SECNUM}.{[2]HFILSSEC} $
the mechanism is our old
friend the heuristic rule which fills in entries for certain facets.

In fact, a conjecture evolves through four stages:

.BN

λλ  A heuristic  rule looks  for a  particular kind  of relationship.
This will typically be of the form "X is a Generalization of Y", or "X is
an example of Y", or "X is the same as Y", or "C1(X,Y)" where C1 is a predicate.

λλ Once found, it is checked, using supporting contacts.
A great deal of empirical evidence must favor the relationship,
and any contradictory evidence must be "explained away" somehow.

λλ Now it is believed, and  AM prints it out to the user. 
It is added as a new entry to the Conjecs facet of both concepts X and Y.
It  is also
added as an entry to the Examples facet of the Conjecture concept.

λλ Eventually,  AM will get around  to the task  "⊗6Check Examples of
Conjecture⊗*",
or to the task "⊗6Check Conjecs of X⊗*".
If AM had any concepts for proving conjectures,  they
would then be invoked. In the current LISP implementation, these are 
absent.
Nevertheless, several "checks" are performed: (i) see if any new empirical
evidence (pro or con) has appeared recently; (ii) see if this conjecture can
be strengthened; (iii) check it for extreme cases, and modify it if necessary;
(iv) Modify the worth ratings of the concepts involved in the conjecture.

.E

The left-hand-side of such  a heuristic rule will be  longer and more
complex  than  most other  kinds,  but the  basic  activities  of the
right-hand-side will still  be filling in an  entry for a  particular
facet. 

The entries filled in will include:
(i) a   new  example   of  Conjectures,
(ii) a new Conjec of each concept involved in the conjecture,
(iii) if we're claiming that
concept X is a generalization of concept Y, then "X" would be added to the
Generalizations facet of Y, and "Y" added to the Specializations facet of X,
(iv) if X is an Example of Y, 
"X" is added to the Examples facet of Y, and "Y" is added to the ISA facet of X.

The
right-hand-side  may also  involve  adding new  tasks to  the agenda,
creating new concepts, and modifying entries of  particular facets of
particular concepts.  As usual, both  sides of such  a heuristic rule
may run any little functions they  want to: any functions which are quick and  have
no side effects (e.g., FORALL tests,  PRINT functions, accesses to
a specified facet of some concept).

. SSSEC(An Illustration: α"All primes except 2 are oddα")

As an illustration, here is a heuristic  rule, relevant when checking
examples of any concept:

.GROUP B816 ONCE INDENT 4

If the current task is to Check Examples of X,

and (Forsome Y) Y is a generalization of X,

and Y has at least 10 examples,

and all examples  of Y (ignoring boundary cases) are also examples of
X,

.SHPAGE: PAGE;

.ONCE INDENT 4

Then print the following conjecture: X is really no  more specialized
than Y,

and add it to the Examples facet of Conjectures,

and if the user asks, inform him  that the evidence for this was that
all  ||Examples(Y)|| Y's  (ignoring boundary examples  of Y's) turned
out to be X's,

and Check the truth of this conjecture on boundary examples of Y,

and add "X" to the Generalizations facet of Y,

and add "Y" to the Specializations facet of X,

and (if there is an entry in the Generalizations facet  of Y) add the
following task to  the agenda "Check examples of  Y", for the reason:
"Just as Y was no more general than X, one-of Generalizations(Y)  may
turn out to be no more general than Y", with a rating for that reason
computed       as:       0.4x||Examples(Generalizations(Y))||       +
0.3x||Examples(Y)|| + 0.3xPriority(Current task).

.E APART

Let's take a particular instance where this rule would be useful. Say
the  current  task  is  ⊗6"Check  examples  of  Odd-primes"⊗*.    The
left-hand-side  of  the  rule  is  run,  and  is  satisfied when  the
generalization Y  is the  concept "Primes".   Let's see  why this  is
satisfied.

One of entries of the Generalization facet of Odd-primes is "Primes".
AM grabs hold of the 30  examples of primes (located on the  Examples
facet of Primes), and  removes the ones which are  tagged as boundary
examples  ("2" and  "3").    A definition  of  Odd-prime  numbers is
obtained (Definitions facet  of Odd-primes),  and it is  run on  each
remaining example  of primes (5, 7,  11, 13, 17, ...).   Sure enough,
they  all satisfy the  definition.  So all  primes (ignoring boundary
cases)  appear  to be  odd.    The  left-hand-side  of  the  rule  is
satisfied.

At this  point, the user  sees a message  of the form  "Odd-primes is
really no more specialized than Primes".   If he interrupts and  asks
about it,  he is  told that  the evidence  for this was  that all  30
primes  (ignoring  boundary  examples of  primes)  turned  out to  be
Odd-primes.

Of the boundary examples (the numbers 2 and 3), only  the integer
"2" fails  to be  an odd-prime, so  the the user  is notified  of the
finalized   conjecture:  "All  primes  (other   than  `2')  are  also
odd-primes".   This is added  as an  entry on  the Examples facet  of
Conjectures.

Before  beginning all this,  the Generalizations facet  of Odd-primes
pointed to Primes. Now, this rule has us add "Primes" as an entry  on
the  Specializations facet  of  Odd-primes.  Thus  Primes is  both  a
generalization and a specialization of Odd-primes (to within a single
stray exception), and AM will be able to treat these two  concepts as
if they were merged together.  They are still kept separate, however,
in  case AM ever needs to know  precisely what the difference between
them is, or in case later evidence shows the conjecture to be false$$
When  space  is  exhausted, one  emergency  measure  AM  takes is  to
destructively coalesce  a pair  of concepts  X,Y where  X is  both  a
generalization of and a specialization of Y, so  long as there are no
"boundary" exceptions. $.

The final action  of the right-hand-side of this rule is to propose a
new task (if there exist some generalizations of the concept Y, which
in our case  is "Primes").  So AM accesses  the Generalizations facet
of Primes, which is "(Numbers)". A new task is therefore added to the
agenda: ⊗6"Check  examples of Primes"⊗*,  with an associated  reason:
"Just as Primes  was no more general than  Odd-primes, so Numbers may
turn out to  be no  more general than  Primes". The  reason is  rated
according to  the formula given  in the rule;  say it gets  the value
500.

To  make this example a  little more interesting,  let's suppose that
the task  ⊗6"Check  examples  of  Primes"⊗* already  existed  on  the
agenda, but for the reason "Many examples of Primes have been found,
but never  checked", with a rating for the reason of 100, and for the
task as a whole of 200.  The global  task-rating formula then assigns
the task  a new overall priority  of 600, because of  the new, fairly
good reason supporting it.

When that  task is  eventually chosen,  the heuristic  rule  pictured
above (at the beginning  of this subsection) will trigger  and be run
again, with  X=Primes and Y=Numbers. That is,  AM will be considering
whether (almost) all numbers are  primes.  The left-hand-side of  the
heuristic rule will  quickly fail, since, e.g., "6" is  an example of
Numbers which does not satisfy the definition of Primes.

. SSSEC(Another illustration: Discovering Unique Factorization)

.UFTSEC: SECNUM;

.UFTSSEC: SSECNUM;

.UFTSSSEC: SSSECNUM;

.UFTPAGE: PAGE;

Below is  a heuristic  rule which  is a  key agent in  the process  of
"noticing"  the   fundamental  theorem  of  arithmetic$$  The  unique
factorization conjecture: any positive  integer n can be  represented
as  the product  of prime  numbers in  precisely one  way (to  within
reorderings  of those prime factors).  Thus 28 =  2x2x7, and we don't
distinguish between the factorization  (2 2 7) and  (2 7 2). $.
(The reader may skip this subsection; it contains more details about
how AM actually proposed that interesting conjecture).

The first conjunct on the IF-part of the heuristic rule indicates that it's
relevant  to checking examples  of any  given operation F.  A typical
example is selected at random, say F(x)=y. Then y is examined, to see
if it satisfies any more stringent properties than those specified in
the  Domain/range facet of F.   That is, the  Domain/range facet of F
contains an entry of the form A→B;  so if x is an A, then all  we are
guaranteed about y  is that it is an  example of a B.   But now, this
heuristic  is asking  if y  isn't really  an example  of a  much more
specialized concept than B.  If it is (say it's an  example of a BB),
then the  rest of the examples of  F are examined to see  if they too
satisfy this same property. If all examples appear to map from domain
set A into  range set BB (where BB  is much more restricted  than the
set  B specified originally in  the Domain/range facet of  F), then a
new conjecture is  made: the domain/range  of F  is really A→BB,  not
A→B.  Here is that rule, in crisper notation:

.GROUP B816 ONCE INDENT 4

If the current task is to Check Examples of the operation F,

and F is an operation from domain A into range B,

and F has at least 10 examples,

and a typical one of these examples is "<x→y>" (so xεA and yεB),

and (Forsome Specialization BB of B), y is a BB.

and ↓_all_↓  examples of F (ignoring  boundary cases) turn out  to be
BB's,

.ONCE INDENT 4

Then print the following conjecture: "F(a) is always a BB, not simply
a B",

and add it to the Examples facet of Conjectures concept,

and add "<A  → BB>" as a  new entry to  the Domain/range facet of  F,
replacing "<A → B>",

and if the user asks,  inform him that the evidence for this was that
all ||Examples(F)|| examples of F (ignoring boundary examples) turned
out to be BB's,

and check  the  truth of  this conjecture  by running  F on  boundary
examples of A.

.E APART

Let's see how this  rule was used in one instance. In Task 79 (in the
sample excerpt  in Chapter  2), AM  defined  the concept  Prime-times,
which was  a function transforming any  number n into the  set of all
factorizations  of n into primes.  For example, Prime-times(12)={(2 2
3)}, Prime-times(13)={(13)}.   The  domain of  F=Prime-times was  the
concept Numbers.  The range was  Sets. More precisely,  the range was
Sets-of-Bags-of-Numbers, but  AM didn't  know  that concept  at  that
time.

The above  heuristic rule was  applicable. F  was Prime-times, A  was
Numbers, and B was  Sets.  There were far more than 10 known examples
of Prime-times  in action.  A typical  example was  this  one: <21  →
{(3,7)}>.    The  rule  now  asked   that  {(3,7)}  be  fed  to  each
specialization  of  Sets,  to  see  if  it  satisfied  any  of  their
definitions.  The  Specializations facet of  Sets was acccessed,  and
each  concept pointed  to was  run (its  definition was  run)  on the
argument   "{(3,7)}".      It   turned   out   that   Singleton   and
Set-of-doubletons were the only two specializations of Sets satisfied
by this example.   At this moment, AM had narrowed down the potential
conjectures to these two

.SKIP 1 BN

λλ Prime-times(x) is always a singleton set.

λλ Prime-times(x) is always a set of doubletons.

.E

Each example  of Prime-times  was examined,  until one  was found  to
refute   each  conjecture   (for   example,  <8→{(2,2,2)}>   destroys
conjecture 2).   But no example was able to disprove conjecture 1. So
the heuristic rule  plunged forward, and printed  out to the user  "A
new conjecture: Prime-times(a)  is always a singleton-set, not simply
a set".   The  entry <Numbers  → Singleton-sets  > was  added to  the
Domain/range facet of Prime-times, replacing the old entry <Numbers →
Sets>.

.ONCE TURN ON "{}"

Let's  digress for a  moment, discuss  the robustness of  the system.
What if this  heuristic were to  be excised?  Could AM still  propose
unique factorization?   The answer  is yes,  there are other  ways to
notice  it. If  AM has  the concept  of a Function$$  A single-valued
relation.  That is, for any domain element x, F(x) contains precisely
one  member.  It  is never  empty (i.e., undefined),  nor is  it ever
larger than a singleton (i.e., multiple-valued). $, then a  heuristic
rule like the one in the previous subsection (Page {[3] SHPAGE}) will
cause AM  to ask if Prime-times is not merely  a relation, but also a
Function.

The past few sections should have convinced the reader  that isolated
heuristic rules really can do all kinds of things: propose new tasks,
create   new   concepts,  fill   in   entries  for   specific  facets
(goal-driven),  and  look  for  conjectures   (data-driven  empirical
induction).   The rules appear  fairly general$$ i.e.,  applicable in
many situations. It  would be worse  than useless if  a rule  existed
which could lead  to a single  discovery like "Fibonacci  series" but
never  lead  to  any other  discoveries.  The  reasons for  demanding
generality are not only "fairness", but the insights which occur when
it is observed that several  disparate concepts were all motivated by
the  same general principle (e.g., "looking  for the inverse image of
extrema"). $ -- though that must be later verified empirically.  They
are  redundant  in a  pleasing  way: some  of  the most  "important",
well-known, and interesting conjectures  can (apparently) be  derived
in many ways.  Again, we'll have to check this experimentally.

.SSEC(Gathering Relevant Heuristics)

Each concept has facets which contain some heuristics.  Some of these
are  for  filling  in,  some  for  checking,  some for  deciding  the
interestingness$$ The  reader  has already  seen  several  heuristics
useful for  filling in and checking  facets; here is one  for judging
interestingness: an  entry on the Interest facet of Compose says that
a composition AoB  is more interesting if  the range of B  equals the
domain of A. $, some for noticing new conjectures, etc.

AM contains hundreds of these heuristics.  In order to save time (and
to make AM appear more rational), each heuristic should only be tried
in situations where it might apply, where it makes sense.

How is AM  able to zero  in on the  relevant heuristic rules,  once a
task has been selected from the agenda?

. SSSEC(Domain of Applicability)

The  secret  is that  each  heuristic rule  is  stored  somewhere ⊗4a
propos⊗* to  its "domain of  applicability". This  "proper place"  is
determined by the first conjunct in the left-hand side of the rule.

What does this mean?  Consider this heuristic:

.GROUP B816

.ONCE INDENT 4 TURN OFF "∂"

If the current task  is to fill in examples of the  operation F, ⊗8     <====⊗*

and some examples of the domain of F are known,

.ONCE INDENT 4

Then  one way to  get examples  of F is  to run F  on randomly chosen
examples of the domain of F.

.E APART

This is a reasonable thing to try -- but  only in certain situations.
Should it be tried when  the current task is to check the Worth facet
of the Sets concept? No, it would be irrational.  Of course, even  if
it were tried then, the left-hand-side would  fail very quickly.  Yet
⊗4some⊗*  cpu  time  would  have been  used,  and  if  the user  were
watching, his opinion of AM would decrease.

That particular heuristic has  a precise domain of applicability:  AM
should use it whenever the current task  is to fill in examples of an
operation, and only in those kinds of situations.

The  key observation is  that a heuristic typically  applies to ⊗4all
examples  of  a  particular   concept  C⊗*.  In  the  case   we  were
considering,  C=Operation.    Intuitively,  we'd  like to  tack  that
heuristic onto the  Examples facet  of the concept  Operation, so  it
would only "come to mind" in appropriate situations.  This is in fact
precisely where the heuristic rule ⊗4is⊗* stored.

Initially, the author identified the proper concept C and facet F for
each heuristic H which AM  possessed, and tacked H onto C.F$$  Recall
that C.F is an abbreviation for facet F of concept C $.  This was all
preparation, completed  long before AM started up.  Each heuristic was
tacked  onto  the  facet  which  uniquely  indicates  its  domain  of
applicability.   The first conjunct of the  IF-part of each heuristic
indicates where it is stored and  where it is applicable. Notice  the
little arrow (⊗8<==⊗*) pointing   to  that  conjunct  above.$$  In   the  LISP
implementation,  these  first conjuncts  are omitted,  since  the
⊗4placement⊗* of a  heuristic serves the  same purpose  as if it  had
some  "pre-preconditions" (like  these  first  conjuncts) to 
determine relevance quickly. $.

While AM is running, it will choose a task dealing with, say, facet F
of concept X.  AM  must quickly locate the heuristic rules  which are
relevant  to  satisfying that  chosen  task.  AM  simply locates  all
concepts which  claim X  as an  example.   If the  current task  were
"⊗6Check  the   Domain/range  of  Intersect⊗7o⊗6Intersect⊗1"$$   This
operation is defined as: ⊗1Intersect⊗7o⊗1Intersect⊗7(x,y,z) ≡ (x ∩ y)
∩ z.  It accepts  3 sets as arguments, and  returns a new set as  its
value.  $,  then X would be Intersect-o-Intersect,  and such concepts
would  include  Compose-with-Self,  Composition,  Operation,  Active,
Any-concept, and Anything.   AM then  collects the heuristics  tacked
onto facet F (in this case, F would be Domain/range) of each of those
concepts. All such heuristics will be relevant. In the current  case,
some  relevant heuristics  might be  garnered  from the  Domain/range
facet  of  the  concept  Operation.  All  those heuristics  would  be
relevant   for    dealing   with    the   Domain/range    facet    of
Intersect-o-Intersect.

So the whole problem of locating relevant heuristics has been reduced
to the problem  of efficiently finding all concepts of  which X is an
example (for a given concept X).  This process is called  ⊗4"rippling
away from  X in the  ISA direction"⊗*, and  forms the subject  of the
next subsection.   

. SSSEC(Rippling)

Given a concept C, how can AM find all  the concepts which claim C as
an example?

The most obvious  scheme is to store this information explicitly.  So
the Examples facet of C would  point to all known examples of C,  and
the Isa facet  of C would point  to all known concepts  claiming C as
one of their examples.  Why  not just do this?   One can substitute a
modest amount of processing  time (via chasing links around)  for the
vast amount of storage space that would be needed to have "everything
point to everything".

.GENLSPEC: PAGE

Each facet contains only enough pointers so that the entire graph  of
Exs/Isa and Spec/Genl links could be reconstructed  if needed.  Since
"Genl"$$
"Genl" is an abbreviation for the Generalizations facet of a concept;
similarly, "Spec" means Specializations, Exs means Examples, etc.
"Isa" is the converse facet to Exs; i.e., Aε B.Exs iff Bε A.Isa.
Saying "Genl is transitive" just means the following: if A is a
generalization of B, and B of C, then A is also a generalization of C".
$is a transitive  relation, AM can  compute that  Numbers is a
generalization of Mersenne-primes, if the facet  Mersenne-primes.Genl
contains  the  entry  "Odd-primes", and  Odd-primes.Genl  contains  a
pointer to  "Primes", and Primes.Genl points to "Numbers".  This kind
of "⊗4rippling⊗*" activity is used to efficiently locate all concepts
related  to a given  one X.  In particular, AM  knows how  to "ripple
upward in  the Isa  direction", and  quickly$$ With  about 200  known
concepts, with each Isa facet and each Genl facet pointing to about 3
other  concepts, about  25 links  will  be traced  along in  order to
locate about a dozen final  concepts, each of which claims the  given
one as an example.  This whole rippling process, tracing 25 linkages,
uses  less than .01  cpu seconds,  in compiled Interlisp,  on a KI-10
type PDP-10.  $ locate all  concepts which  claim X  as one of  their
examples.

.ONCE TURN ON "{}"

It  turns out  that AM  cannot simply  call for  X.Isa, then  the Isa
facets of those concepts, etc., because Isa is not transitive$$ If  x
isa y,  and y  isa z,  then x  is (generally) ↓_NOT_↓ a  z. This is  due to  the
intransitivity of  "member-of".  Generalization is transitive, on the
other hand, because "subset-of" is transitive. $.  For the interested
reader, the algorithm  AM uses to collect Isa's of  X is given below.
For  the ⊗4very⊗* interested reader, it  is explained in great detail
in appendix {[2]RIPPLE}.

.BN ONCE PREFACE 1

λλ All  generalizations  of the  given  concept X  are located.    AM
accesses X.Genl, then the Genl facets of ⊗4those⊗* concepts, etc.

λλ The "Isa" facet of each of those concepts is accessed.

λλ AM  locates all generalizations of  these newly-found higher-level
concepts.  This is the  list of all known  concepts which claim X  as
one of their examples.

.E

.GIGPAGE: PAGE;

.ONCE TURN ON "{}"

In regular form, one might express this rippling recipe more compactly as:
⊗6Generalizations↑*(Isa(Generalizations↑*(X)))⊗*.  There is not much need
for a detailed understanding  of this process,  hence it will not  be
delved into further in the body of this thesis.  Appendix {[2]RIPPLE}
contains more than anyone would want to know about rippling.

. SSSEC(Ordering the Relevant Heuristics)

Now that  all  these relevant  heuristics have  been assembled,  in
what
order should AM  execute them?$$ The discussion below assumes that
the heuristics don't  interact with each other;  i.e., that each  one
may  act  independently  of  all   others.    The  validity  of  this
simplification  is  tested empirically  (see Chapter  {[2]RESULT}) and
discussed  theoretically (see  Chapter  {[2]EVALU})  later. $  It  is
important  to  note that  the  heuristics  tacked  onto very  general
concepts  will  be  applicable  frequently,  yet  will  not  be  very
powerful.   For example, here  is a typical  heuristic rule  which is
tacked  onto   the  Examples  facet  of   the  very  general  concept
Any-concept:

.B816

If the current task is to fill in examples of any concept X,

Then one way to get them is to symbolically instantiate$$
"Symbolic instantiation" is a euphemism for a bag of tricks which transform
a declarative definition of a concept into particular entities satisfying
that definition. 
The only constraint on the  tricks is that they not actually ↓_run_↓ the definition.
One such trick might be: if the definition is recursive,
merely find some entity that satisfies the base step. AM has not broken any
new ground in this area, hence this will not be covered any more deeply here.
The interested reader is directed to [ref].
$ a definition of X.

.E

It takes a tremendous amount of inference to squeeze a couple awkward
examples  of  Intersect-o-Intersect  out  that  concept's definition.
Much time could be  wasted doing so$$ Incidentally,  this illustrates
why  no  single  heuristic   should  be  allowed  to  monopolize  the
processing of any one task.$.

.APART GROUP

Just as  general heuristics  are weak  but often  relevant,  specific
heuristics are powerful but rarely relevant.  Consider this heuristic
rule,   which    is   attached   to   the   very   specific   concept
Compose-with-Self:

.B816

If the current task is to fill in examples of the composition F-o-F,

Then include any fixed-points of F.

.E APART

For example,  since  Intersect(⊗4phi⊗*,X)  equals  ⊗4phi⊗*,  so  must
Intersect-o-Intersect(⊗4phi⊗*,X,Y).$$ ⊗4phi⊗1  ⊗7is another  name for
the empty  set, also written α{α}.  This last sentence thus says that
since α{α} ∩ X = α{α},  then (α{α} ∩ X) ∩ Y must also  equal α{α}. $.
Assuming  that  such   examples  exist  already  on  Intersect,  this
heuristic will fill in a  few examples of Intersect-o-Intersect  with
essentially  no  processing  required.    Of  course  the  domain  of
applicability of this heuristic is minuscule.

As  we expected, the narrower  its domain of  applicability, the more
powerful and efficient a heuristic  is, and the less frequently  it's
useful.   Thus  in any  given situation,  where AM  has gathered  many
heuristic  rules,  it  will  probably be  best  to  execute  the most
specific ones first, and execute the most general ones last.

Below are summarized the three  main points that make up  AM's scheme
for  finding relevant heuristics  in a  "natural" way and  then using
them:

.BN PREFACE 1

λλ Each heuristic is tacked onto the  most general concept  for
which it applies: it is given as large a domain of applicability
as possible$$ This  will maximize its generality, but leave its power
untouched.   This brings  it  closer to  the "ideal"  tradeoff  point
between these two quantities. $.

λλ When the current task deals with concept C, AM ripples away from C
and
quickly locates
all the concepts of which C is an example.  Each of them will contain
heuristics relevant to dealing with C.

λλ AM  then applies those heuristics  in  order of  increasing
generality$$  You  may  wonder   how  AM  orders  the  heuristics  by
generality. It  turns out  that  the rippling  process  automatically
gathers heuristics in  order of increasing  generality.  In  the LISP
system, each rule is therefore executed as soon as it's found.  So AM
nevers  wastes time  gathering  heuristics  it  won't  have  time  to
execute.$.

.E